Scaling Smoothed Language Models

نویسندگان

  • Amparo Varona
  • M. Inés Torres
چکیده

In Continuous Speech Recognition (CSR) systems a Language Model (LM) is required to represent the syntactic constraints of the language. Then a smoothing technique needs to be applied to avoid null LM probabilities. Each smoothing technique leads to a different LM probability distribution. Test set perplexity is usually used to evaluate smoothing techniques but the relationship with acoustic models is not taken into account. In fact, it is well-known that to obtain optimum CSR performances a scaling exponential parameter must be applied over LMs in the Bayes’ rule. This scaling factor implies a new redistribution of smoothed LM probabilities. The shape of the final probability distribution is due to both the smoothing technique used when designing the language model and the scaling factor required to get the optimum system performance when integrating the LM into the CSR system. The main object of this work is to study the relationship between the two factors, which result in dependent effects. Experimental evaluation is carried out over two Spanish speech application tasks. Classical smoothing techniques representing very different degrees of smoothing are compared. A new proposal, Delimited discounting, is also considered. The results of the experiments showed a strong dependence between the amount of smoothing given by the smoothing technique and the way that the LM probabilities need to be scaled to get the best system performance, which is perplexity independent in many cases. This relationship is not independent of the task and available training data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Integrating High and Low Smoothed LMs in a CSR System

1 In Continuous Speech Recognition (CSR) systems, acoustic and Language Models (LM) must be integrated. To get optimum CSR performances, it is well-known that heuristic factors must be optimised. Due to its great effect on final CSR performances, the exponential scaling factor applied to LM probabilities is the most important. LM probabilities are obtained after applying a smoothing technique. ...

متن کامل

Implicational Scaling of Reading Comprehension Construct: Is it Deterministic or Probabilistic?

In English as a Second Language Teaching and Testing situations, it is common to infer about learners’ reading ability based on his or her total score on a reading test. This assumes the unidimensional and reproducible nature of reading items. However, few researches have been conducted to probe the issue through psychometric analyses. In the present study, the IELTS exemplar module C (1994) wa...

متن کامل

A new shape retrieval method using the Group delay of the Fourier descriptors

In this paper, we introduced a new way to analyze the shape using a new Fourier based descriptor, which is the smoothed derivative of the phase of the Fourier descriptors. It is extracted from the complex boundary of the shape, and is called the smoothed group delay (SGD). The usage of SGD on the Fourier phase descriptors, allows a compact representation of the shape boundaries which is robust ...

متن کامل

Consistent estimation of a mean planar curve modulo similarities

We consider the problem of estimating a mean planar curve from a set of J random planar curves observed on a k-points deterministic design. We study the consistency of a smoothed Procrustean mean curve when the observations obey a deformable model including some nuisance parameters such as random translations, rotations and scaling. The main contribution of the paper is to analyze the influence...

متن کامل

Scaling and Fractal Concepts in Saturated Hydraulic Conductivity: Comparison of Some Models

Measurement of soil saturated hydraulic conductivity, Ks, is normally affected by flow patterns such as macro pore; however, most current techniques do  not differentiate flow types, causing major problems in describing water and chemical flows within the soil matrix. This study compares eight models for scaling Ks and predicted matrix and macro pore Ks, using a database composed of 50 datasets...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • I. J. Speech Technology

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2005